Microprocessors load/store functional units and data caches.

A load/store functional unit and a corresponding data cache of a superscalar microprocessor is disclosed. The load/store functional unit includes a plurality of reservation station entries which are accessed in parallel and which are coupled to the data cache in parallel. The load/store functional unit also includes a store buffer circuit having a plurality of store buffer entries. The store buffer entries are organized to provide a first in first out buffer where the outputs from less significant entries of the buffer are provided as inputs to more significant entries of the buffer.

The present invention relates to microprocessors, and, more particularly, to providing microprocessors with high performance data caches and load/store functional units.

Microprocessors are processors which are implemented on one or a very small number of semiconductor chips. Semiconductor chip technology is ever increasing the circuit densities and speeds within microprocessors; however, the interconnection between the microprocessor and external memory is constrained by packaging technology. Though on-chip interconnections are extremely cheap, off-chip connections are very expensive. Any technique intended to improve microprocessor performance must take advantage of increasing circuit densities and speeds while remaining within the constraints of packaging technology and the physical separation between the processor and its external memory. While increasing circuit densities provide a path to evermore complex designs, the operation of the microprocessor must remain simple and clear for users to understand how to use the microprocessor.

While the majority of existing microprocessors are targeted toward scalar computation, superscalar microprocessors are the next logical step in the evolution of microprocessors. The term superscalar describes a computer implementation that improves performance by a concurrent execution of scalar instructions. Scalar instructions are the type of instructions typically found in general purpose microprocessors. Using today's semiconductor processing technology, a single processor chip can incorporate high performance techniques that were once applicable only to large-scale scientific processors. However, many of the techniques applied to large scale processors are either inappropriate for scalar computation or too expensive to be applied to microprocessors.

A microprocessor runs application programs. An application program comprises a group of instructions. In running the application program, the processor fetches and executes the instructions in some sequence. There are several steps involved in executing even a single instruction, including fetching the instruction, decoding it, assembling its operands, performing the operations specified by the instruction, and writing the results of the instruction to storage. The execution of instructions is controlled by a periodic clock signal. The period of the clock signal is the processor cycle time.

The time taken by a processor to complete a program is determined by three factors: the number of instructions required to execute the program; the average number of processor cycles required to execute an instruction; and, the processor cycle time. Processor performance is improved by reducing the time taken, which dictates reducing one or more of these factors.
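The three factors multiply: execution time is the instruction count times the average cycles per instruction times the cycle time. The following is a minimal sketch of that relationship; the function name and the sample figures are illustrative assumptions and do not come from the specification.

```python
def execution_time(instruction_count, cycles_per_instruction, cycle_time_ns):
    """Return program execution time in nanoseconds.

    This is the product of the three factors named in the text. The function
    name and the sample numbers below are illustrative assumptions.
    """
    return instruction_count * cycles_per_instruction * cycle_time_ns


# Hypothetical program: 1,000,000 instructions, 2 cycles each, 10 ns cycle.
baseline = execution_time(1_000_000, 2.0, 10.0)

# Halving any single factor (here, cycles per instruction) halves the time.
improved = execution_time(1_000_000, 1.0, 10.0)
```

Because the factors are multiplied, a technique that lowers one factor only helps if it does not raise another by a larger ratio.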

One way to improve performance of the microprocessor is by overlapping the steps of different instructions using a technique called pipelining. To pipeline instructions, the various steps of instruction execution are performed by independent units called pipeline stages. Pipeline stages are separated by clocked registers. The steps of different instructions are executed independently in different pipeline stages. Pipelining reduces the average number of cycles required to execute an instruction, though not the total amount of time required to execute an instruction, by permitting the processor to handle more than one instruction at a time. This is done without increasing the processor cycle time appreciably. Pipelining typically reduces the average number of cycles per instruction by as much as a factor of three. However, when executing a branch instruction, the pipeline may sometimes stall until the result of the branch operation is known and the correct instruction is fetched for execution. This delay is known as the branch-delay penalty. Increasing the number of pipeline stages also typically increases the branch-delay penalty relative to the average number of cycles per instruction.
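The effect of pipelining and of the branch-delay penalty on the average number of cycles per instruction can be illustrated with an idealized cycle count. The sketch below is an assumption-laden model, not the design described here: it assumes one cycle per stage, a pipeline that is filled once, and a fixed number of stall cycles per stalled branch. All names and figures are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_steps):
    """Without pipelining, each instruction takes all of its steps in turn."""
    return n_instructions * n_steps


def pipelined_cycles(n_instructions, n_stages, n_branch_stalls=0, stall_cycles=0):
    """Idealized pipeline: fill once, then complete one instruction per cycle.

    Each stalled branch adds `stall_cycles` idle cycles, which models the
    branch-delay penalty in the simplest possible way (an assumption).
    """
    return n_stages + (n_instructions - 1) + n_branch_stalls * stall_cycles


n = 1000                                   # hypothetical instruction count
ideal = pipelined_cycles(n, 3)             # three stages, no stalls
with_branches = pipelined_cycles(n, 3, n_branch_stalls=100, stall_cycles=2)

cpi_unpipelined = unpipelined_cycles(n, 3) / n   # 3.0 cycles per instruction
cpi_ideal = ideal / n                            # close to 1.0
cpi_with_branches = with_branches / n            # stalls raise the average
```

In this model the three-stage pipeline approaches the factor-of-three reduction mentioned above only when branches rarely stall.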

Another way to improve processor performance is to increase the speed with which the microprocessor assembles the operands of an instruction and writes the results of the instruction; these functions are referred to as a load and a store, respectively. Both of these functions depend upon the microprocessor's use of its data cache.

During the development of early microprocessors, instructions took a long time to fetch compared to the execution time. This motivated the development of complex instruction set computer (CISC) processors. CISC processors were based on the observation that, given the available technology, the number of cycles per instruction was determined mostly by the number of cycles taken to fetch the instruction. To improve performance, the two principal goals of the CISC architecture were to reduce the number of instructions needed for a given task and to encode these instructions densely. It was acceptable to accomplish these goals by increasing the average number of cycles taken to decode and execute an instruction because, using pipelining, the decode and execution cycles could be mostly overlapped with a relatively lengthy instruction fetch. With this set of assumptions, CISC processors evolved densely encoded instructions at the expense of decode and execution time inside the processor. Multiple-cycle instructions reduced the overall number of instructions and thus reduced the overall execution time because they reduced the instruction fetch time.

In the late 1970's and early 1980's, memory and packaging technology changed rapidly. Memory densities and speed increased to the point where high speed local memories called caches could be implemented near the processor. Caches are used by the processor to temporarily store instructions and data. When instructions are fetched more quickly using caches, the performance is limited by the decode and execution time that was previously hidden within the instruction fetch time. The number of instructions does not affect performance as much as the average number of cycles taken to execute an instruction.

The improvement in memory and packaging technology, to the point where instruction fetching did not take much longer than instruction execution, motivated the development of reduced instruction set computer (RISC) processors. To improve performance, the principal goal of a RISC architecture is to reduce the number of cycles taken to execute an instruction, allowing some increase in the total number of instructions. The trade-off between cycles per instruction and the number of instructions is not one to one. Compared to CISC processors, RISC processors typically reduce the number of cycles per instruction by factors of three to five, while they typically increase the number of instructions by thirty to fifty percent. RISC processors rely on auxiliary features such as a large number of general purpose registers, and instruction and data caches to help the compiler reduce the overall instruction count or to help reduce the number of cycles per instruction.
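The net effect of this trade-off can be worked out from the figures just given, assuming (as a simplification not stated in the text) that the processor cycle time is the same for both designs. The helper name below is hypothetical.

```python
def relative_time(instruction_growth, cpi_reduction):
    """Execution time relative to the baseline, at an equal cycle time.

    `instruction_growth` is the factor by which the instruction count rises
    (1.3 means thirty percent more instructions); `cpi_reduction` is the
    factor by which cycles per instruction fall. Equal cycle time is an
    assumption made only for this illustration.
    """
    return instruction_growth / cpi_reduction


# Least favourable combination of the quoted ranges:
# fifty percent more instructions, cycles per instruction cut by three.
worst = relative_time(1.5, 3)

# Most favourable combination:
# thirty percent more instructions, cycles per instruction cut by five.
best = relative_time(1.3, 5)
```

Under that assumption the quoted ranges correspond to a program running in roughly one quarter to one half of the baseline time.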

A typical RISC processor executes one instruction on every processor cycle. A superscalar processor reduces the average number of cycles per instruction beyond what is possible in a pipelined scalar RISC processor by allowing concurrent execution of instructions in the same pipeline stage as well as concurrent execution of instructions in different pipeline stages. The term superscalar emphasizes multiple concurrent operations on scalar quantities as distinguished from multiple concurrent operations on vectors or arrays as is common in scientific computing.

While superscalar processors are conceptually simple, there is more to achieving increased performance than widening a processor's pipeline. Widening the pipeline makes it possible to execute more than one instruction per cycle but there is no guarantee that any given sequence of instructions can take advantage of this capability. Instructions are not independent of one another but are interrelated; these interrelationships prevent some instructions from occupying the same pipeline stage. Furthermore, the processor's mechanisms for decoding and executing instructions can make a big difference in its ability to discover instructions that can be executed simultaneously.

Superscalar techniques largely concern the processor organization independent of the instruction set and other architectural features. Thus, one of the attractions of superscalar techniques is the possibility of developing a processor that is code compatible with an existing architecture. Many superscalar techniques apply equally well to either RISC or CISC architectures. However, because of the regularity of many of the RISC architectures, superscalar techniques have initially been applied to RISC processor designs.

The attributes of the instruction set of a RISC processor that lend themselves to single cycle decoding also lend themselves well to decoding multiple RISC instructions in the same clock cycle. These include a general three operand load/store architecture, instructions having only a few instruction lengths, instructions utilizing only a few addressing modes, instructions which operate on fixed-width registers and register identifiers in only a few places within the instruction format. Techniques for designing a superscalar RISC processor are described in Superscalar Microprocessor Design, by William Michael Johnson, 1991, Prentice-Hall, Inc. (a division of Simon & Schuster), Englewood Cliffs, New Jersey.
In contrast to RISC architectures, CISC architectures use a large number of different instruction formats. One CISC microprocessor architecture which has gained wide-spread acceptance is the x86 architecture. This architecture, first introduced in the i386™ microprocessor, is also the basic architecture of both the i486™ microprocessor and the Pentium™ microprocessor, all available from the Intel corporation of Santa Clara, California. The x86 architecture provides for three distinct types of addresses, a logical address, a linear address and a physical address.
The logical address represents an offset from a segment base address. The offset, referred to as the effective address, is based upon the type of addressing mode that the microprocessor is using. These addressing modes provide different combinations of four address elements, a displacement, a base, an index and a scale. The segment base address is accessed via a selector. More specifically, the selector, which is stored in a segment register, is an index which points to a location in a global descriptor table (GDT). The GDT location stores the linear address corresponding to the segment base address.
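The way the four address elements combine into an effective address can be sketched as follows. This is a minimal illustration assuming 32-bit addressing with wrap-around; the function name, the default values and the sample operands are hypothetical and are not taken from the specification.

```python
MASK32 = 0xFFFFFFFF  # assume 32-bit effective addresses that wrap around


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Combine the four x86 address elements into an effective address.

    An addressing mode that omits an element corresponds to leaving that
    argument at its default here (an illustrative convention).
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factors are 1, 2, 4 or 8")
    return (base + index * scale + displacement) & MASK32


# Hypothetical operands: element 3 of an array of 4-byte items at base 0x1000,
# plus a 0x10 byte displacement.
ea = effective_address(base=0x1000, index=3, scale=4, displacement=0x10)
```

The result is still only an offset; it becomes a linear address once the segment base is added.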

The translation between logical and linear addresses depends on whether the microprocessor is in Real Mode or Protected Mode. When the microprocessor is in Real Mode, then a segmentation unit shifts the selector left four bits and adds the result to the offset to form the linear address. When the microprocessor is in Protected Mode, then the segmentation unit adds the linear base address pointed to by the selector to the offset to provide the linear address.
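The two translation rules can be sketched side by side. This is a simplified illustration: the descriptor table is modelled as a plain dictionary from selector to base address, which ignores the other fields of a real descriptor and any limit or protection checks. All names and sample values are assumptions.

```python
def real_mode_linear(selector, offset):
    """Real Mode: selector shifted left four bits, plus the offset."""
    return (selector << 4) + offset


def protected_mode_linear(selector, offset, descriptor_table):
    """Protected Mode: segment base found via the selector, plus the offset.

    `descriptor_table` is a hypothetical dict standing in for the GDT; it
    maps a selector directly to the linear base address of its segment.
    """
    base = descriptor_table[selector]
    return (base + offset) & 0xFFFFFFFF


# Hypothetical values chosen only to show the arithmetic.
real = real_mode_linear(0x1234, 0x0010)

gdt = {0x08: 0x00400000}
protected = protected_mode_linear(0x08, 0x101C, gdt)
```

In both modes the offset is the effective address produced by the addressing mode; only the source of the segment base differs.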
The physical address is the address which appears on the address pins of the microprocessor and is used to physically address external memory. The physical address does not necessarily correspond to the linear address. If paging is not enabled then the 32-bit linear address corresponds to the physical address. If paging is enabled, then the linear address must be translated into the physical address. A paging



EP 0 651 323 A1 



unit performs this translation 

level lable fe a PaL ^^O"" 
«-ea a Z^alT ' 

- Page re«T;E„TrlT;p'''r:"*'''°^ 

Includes a slarting addreT„f a "'^ """" 

to as the nni ™L ! °" l^9« ''ame. rsfe„ed 

=.a.:.S WoSraf^; J,?' 
A1 2 - A21 ,1,. r f^a^- Address bits 

address of the ™n. ,^ " ^lartlng 
address. ''""'^^ss to form the physical 

pan.rri rrTr r„^r • 

and the Penlmm~ „ ^"""^Pn^assor cache 

viaph.s,2"aresse^srvers:ar;«'T'" 

of these processors T functional units 

Of load oZTZ ' "'^•'y 

pluL^^rSiLL"? k" '•^"'^ a 

ple IT ^"^"y^ <l'a»ings. by way o, exam- so 



cnb:^p,';%^:et,rrj"'^'"---"'- 

dala'Sch? ' "''^^ -"V »' '»e F,g. 2 

a»o:t':trhZtr„ir™°--- 
decoder'jr^riTdt^rrr^'r"" 

RISC core 100 Rl=!r- , ™ coupled to 

1'2a,„,reo^d^;bu^e, m as '^'^'^ 

h« u 'oad/store unit 134 fLS<;pr» 

branch sect on 135 mRMQc/-! '•'^ tLbSEC). 

136 (FPU) ^^'''^^^^^•^"d floating point unit 

erlOSandload/storrnftlsTAa "f"'"^^ 
116 are also coupled tTregtt^Te .TT 
buffer 114. TAD bu<, iifl io ^ ^"'^ reorder 

decoder lOrRes Jt b.f, ° '° '"^^^^'^^O" 

der buffer 114 Adlfr f " ^ reor- 

coupled to r^rde~^^^ 

108 and instruction cache ill ;!;^^^^^^^ 
and B operand buses 1lf? in^w . ^ * ""^ ^ 

Wide A operand buses Id fn ^ ' 

operandLses as vi , as ,o:;rr^r.' '^"''^ ^'^^ « 
well as four parallel 12-bit wide A tag buses, four parallel 12-bit wide B tag buses, a 12-bit wide A tag valid bus, a 12-bit wide B tag valid bus, four 4-bit wide destination tag buses and four 8-bit wide opcode buses. Type and dispatch bus 118 includes four 3-bit wide type code buses and one 4-bit wide dispatch bus. Displacement and INLS bus 119 includes two 32-bit wide displacement buses and two 8-bit wide INLS buses.

In addition to instruction cache 104, microprocessor 100 also includes data cache 150 (DCACHE) and physical tag circuit 162. Data cache 150 is coupled to load/store functional unit 134 of the RISC core and with intraprocessor address and data (IAD) bus 102. Instruction cache 104 is also coupled with IAD bus 102. Physical tag circuit 162 interacts with both instruction cache 104 and data cache 150 via the IAD bus. Instruction cache 104 and data cache 150 are both linearly addressable caches. Instruction cache 104 and data cache 150 are physically separate; however, both caches are organized using the same architecture.

Microprocessor 100 also includes memory management unit (MMU) 164 and bus interface unit 160 (BIU). MMU 164 is coupled with the IAD bus and physical translation circuit 162. Bus interface unit 160 is coupled to physical translation circuit 162, data cache 150 and IAD bus 102 as well as an external microprocessor bus such as the 486 XL bus.

Microprocessor 100 executes computer programs which include sequences of instructions. Computer programs are typically stored on a hard disk, floppy disk or other non-volatile storage media which are located in the computer system. When the program is run, the program is loaded from the storage media into main memory 101. Once the instructions of the program and associated data are in main memory 101, individual instructions are prepared for execution and ultimately executed by microprocessor 100.

After being stored in main memory 101, the instructions are passed via bus interface unit 160 to instruction cache 104, where the instructions are temporarily held. Instruction decoder 108 retrieves the instructions from instruction cache 104, examines the instructions and determines the appropriate action to take. For example, decoder 108 may determine whether a particular instruction is a PUSH, POP, LOAD, STORE, AND, OR, EX OR, ADD, SUB, NOP, JUMP, JUMP on condition (BRANCH) or other instruction. Depending on which particular instruction decoder 108 determines is present, the instruction is dispatched to the appropriate functional unit of RISC core 110. LOADs and STOREs are the primary two instructions which are dispatched to load store section 134. Other instructions which are executed by load/store functional unit 134 include PUSH and POP.

The instructions typically include multiple fields in the following format: OP CODE, OPERAND A, OPERAND B and DESTINATION. For example, the instruction ADD A, B, C means add the contents of register A to the contents of register B and place the result in register C. LOAD and STORE operations use a slightly different format. For example, the instruction LOAD A, B, C means place data retrieved from an address on the result bus, where A, B and C represent address components which are located on the A operand bus, the B operand bus and the displacement bus; these address components are combined to provide a logical address which is combined with a segment base to provide the linear address from which the data is retrieved. Also for example, the instruction STORE A, B, C means store data in a location pointed to by an address, where A is the store data located on the A operand bus and B and C represent address components which are located on the B operand bus and the displacement bus; these address components are combined to form a logical address which is combined with a segment base to provide the linear address to which the data is stored.

The OP CODEs are provided from instruction decoder 108 to the functional units of RISC core 110 via opcode bus. Not only must the OP CODE for a particular instruction be provided to the appropriate functional unit, but also the designated OPERANDs for the instruction must be retrieved and sent to the functional unit. If the value of a particular operand has not yet been calculated, then that value must be first calculated and provided to the functional unit before the functional unit can execute the instruction. For example, if a current instruction is dependent on a prior instruction, the result of the prior instruction must be determined before the current instruction can be executed. This situation is referred to as a dependency.

The operands which are needed for a particular instruction to be executed by a functional unit are provided by either register file 112 or reorder buffer 114 to the operand bus. The operand bus conveys the operands to the appropriate functional units. Once a functional unit receives the OP CODE, OPERAND A, and OPERAND B, the functional unit executes the instruction and places the result on a result bus 140 which is coupled to the outputs of all of the functional units and to reorder buffer 114.
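One way to picture how an operand might be supplied by either the register file or the reorder buffer is the lookup sketched below. It is an assumption-based illustration only: the data layout, the rule that the newest pending writer of a register takes priority, and the return of a tag when a result is not yet available are modelling choices made for this sketch and are not a description of the circuits of microprocessor 100.

```python
def read_operand(reg, register_file, reorder_buffer):
    """Return ("value", v) if the operand is available, else ("tag", t).

    `reorder_buffer` is a hypothetical list of dicts ordered oldest to
    newest, each with keys "tag", "dest" and "value" (None until the
    result has been written into the entry).
    """
    for entry in reversed(reorder_buffer):       # newest pending writer first
        if entry["dest"] == reg:
            if entry["value"] is not None:
                return ("value", entry["value"])  # completed, not yet retired
            return ("tag", entry["tag"])          # dependency: result pending
    return ("value", register_file[reg])          # no pending writer


register_file = {"A": 5, "B": 7, "C": 0}
reorder_buffer = [
    {"tag": 0, "dest": "C", "value": 12},     # executed, awaiting retirement
    {"tag": 1, "dest": "A", "value": None},   # still executing
]

operand_b = read_operand("B", register_file, reorder_buffer)
operand_c = read_operand("C", register_file, reorder_buffer)
operand_a = read_operand("A", register_file, reorder_buffer)
```

In this sketch a returned tag stands for the dependency case described above, in which the current instruction must wait for the result of a prior instruction.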

Reorder buffer 114 is managed as a first in first out (FIFO) device. When an instruction is decoded by instruction decoder 108, a corresponding entry is allocated in reorder buffer 114. The result value computed by the instruction is then written into the allocated entry when the execution of the instruction is completed. The result value is subsequently written into register file 112 and the instruction retired if there are no exceptions associated with the instruction and if no speculative branch is pending which affects the instruction. If the instruction is not complete when its associated entry reaches the head of the reorder buffer
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tion Circuit stores the mstruction s op c^ort^^r' 
with tags which reserve places for L ^ ^^' 
ands that will arrive Z t^T ""^^'"^ 
later Thil V \f ^ reservation station circuit 

mUt n J ^ ""'"^ ""'""'^^^ performance by ^- 
mitting microprocessor 100 to contin..« ^ 

Microprocessor 1 00 affords out nf r,r<^^, • 
isolating decoder loa f.v>mT T 
RISCcore 11^ M functional units of 

ZdthT^. ^ specifically, reorder buffer 114 

strucona. The ins.ru« J«^„ZlS 0"^^^ 
'OO a loo* ahead oaiwy ^1 

when a brS' h '"''""^ Performance. Because 

includes reservation station circuit 124, store buffer circuit 180 and load store controller 182. Reservation station circuit 124 includes four reservation station entries (RS0 - RS3) and store buffer circuit 180 includes four store buffer entries (SB0 - SB3).






Reservation station cirmif -lo/i u 
fields Which are required t"pe fj^^^a,' °' 
or a store operation. Data elemTn^i°K 
two reservation station entries per doi^ '^T"" '° 
can be retired from two resTrvatl tt J- "^^^ '"^ 

perdockcyde-ReservatiCstaCcirS^^^^^^^^^ 
pled to the four result buses 4oSL o^^! r ! 
Aoperand buses. 32 bits of ihe four B 41 J^^ 
buses, the A and B tag valid busls tfe fo^^A^oT"' 

buses and the two^wt^h ° displacement 

b« data poriionfof J^rts aT^b ofT" 
Reservation station^^Vt T?/- ^ "^^^^ 
buffer circuit 180^a?SL f *° 
reservation statLTtat s'^R^^^^^ 
spectively). a 12-bit Atag buffTAG a^^ w^o^' 
tag bus (TAG B) as well ^ ^^2 5^1.?, ' V ^ 
(ADDR A. ADDR R\- th^T ^^-bit address buses 

Lpled to^e add™^^^^^^^^ ''"-^ - also 
data cache 150. R^^'^^Tt^Zt'^'"' ^ °' 

to IAD bus 102 ® a'®° *=°"P'ed 

ADDR A H ; r • a" address portion 

DATA B a^f ^ '"'^'"^^^ ^ ^^^^ Portfon 

DATA B. and an address portion. ADDR B 
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^at^, <itore array 200 and linear tag and status 
^ -a, 200 provides .«.. <.a^ 

tag hit signals, COL HliAU j>. v. 
cache array 200. Linear addresses ADDR A and ADDR B are also provided to data store array 200.
During a load operation, reservation station circuit 124 of load store functional unit 134 provides an

array 200 to reservation station circuit 124. If the address was provided to data cache 150 via port A, then the data is provided to reservation station circuit 124 via port A; alternately, if the address was provided to data cache 150 via port B, then the data is provided to reservation station circuit 124 via port B.

trir^rrrsr^-VAaodportB 

ntZ"a Le operat»n. store data is provid- 
ed ,™m7i««ion ^'1°" ciro.it 124 to store buffer 

^'^ri::x:sLSoe.2osrec.i^ea^^^^^^^ 

four Aoperandt«Jses. the fourBoperandt>uses the 
A^nd B tag valid buses, the four AU9 buses, me our 
eT^ buses, the four desUnation tag bus^. *e f^J 
Z,, h..^as the tv«o INLS bUses and the two ois- 

SSst::r=s^= 

182 The bus select signals are generated based 

oneof^h^fourtype^^e-™ 

type code -^-^^^l^tTs^^reoonUo^^^^^^^ 
""'"'.ira b^s sdect signal indicating from which 

buTse^ct s^nals are generated by load store con- 
Uc^,ert82for'input0n.ulUp.exer206 and inputl mul- 

^'^'^^^t::;:^^:^nrstsetofbusselectsignals 



multiplexer circuit 206 provides a first multiplexed reservation station input signal (INPUT 0) which is provided as an input signal to the reservation stations. The INPUT 0 signal includes a signal from one of the A operand buses, a signal from one of the B operand buses, a tag from one of the A tag buses, tag valid bits corresponding to the A tag from the corresponding tag valid bus, a tag from one of the B tag buses, tag valid bits corresponding to the B tag from the corresponding tag valid bus, a destination tag from one of the destination tag buses, an opcode from one of the opcode buses and a displacement from one of the displacement buses. Under control of the second set of bus select signals, multiplexer circuit 208 provides a second multiplexed reservation station input signal (INPUT 1) which is provided as an input signal to the reservation stations. The INPUT 1 signal includes a signal from one of the A operand
rs.rsign.,;<^<.e.«»^^^^^ 

- ?^rg°:cr.-™pondihg.^^^^^^ 

b^a tag from one of the B tag buses, tag «irf bits 
^ponding to the B tag from the "^^^ 

rr^TZ^^ .rom one o, the d^plac 

™"B^4°vation station entnes 210 - 213 each re- ,. 
ceive;Tt:oi„putsign*.iNPUT0..NPUT1.^^^^^^ . 

3. ail^ as well as '^^Zl'^T^T.ZlX"- 
r'°:;r."ri>Stbies;theseresultbusin- 
ifptted" the A and B operand portions p. 
*e ertry o"ly. Information is retrieved from these ,e_ 

:Ssa«fo^r.S:«>onthatisononeonhedes. 
Unation tag buses, then information from the coir^ 
sS^dJtg result bus is retrieved and loaded mto the A 
« ore^nd'f ieid of the reservation ion enU,^3„ 
Additionally, reservation station 

calves a --^'^-^J^ Z ;i"y 
;rprdl"r^:Srs'oreservationstation 
X Ao^rand portion) to store buffer arcu,t 
" 'ra^te -.D^TA A s^n^ and P-"-- 

RS0 reservation station entry to RS0 adder 216. RS0 adder 216 uses this reservation station entry to generate the ADDR A signal. Reservation station entry
RsMii^^ a reservation station entry from reser- 

vat on s^"' ^ "^'^ ^ 

r^^trT;rAr^:j°p:^^^»=-* 

rtrasTe^DUBs^nalandP^vides.^^^^^^^^ 

- ^rrreriTsrtrrrrors^trettry^ 

1™,. me ADDR B signal. Reservation station 
?tS2Te^^es a^iervatloS station entry from reser- 
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vation stations RsTaTdlS'o R"^ '° 
RS3 provides the RsTl Reservation station 

-erv^ationt^ts^'SaTn:^^^^ ^^^^ to 

More specif icaliru^na th! .^T'"°"" 
controller 182 contSeToal ^'S^^'^' 
ervation station 6^11^.0 2. ^"''"'''■^""9 °f res- 
ervation Stat on emrfet I k '^'^ 
any given cyde «'^'«ed in 

v..esa''resera:;onsl^^^^^^^^ 
216 for both foad and ^tr.r^ ° °"^">i 

for a sto. opt^VoratT^rr^hir^^^^^^^ 
station entry to store buffer ifln o reservation 
RSI providi a reserv^I If ^^^^'"'^^'^^ station 
tion station RSO^^ ^^^^^^^^ -erva- 
reservation station entrv ° ^^2 provides a 

and reservation sfei^on RSsT^^^^^^^^ '^^^ 
station entry to res^v^tfof sStsr^T" 
operation, the data correspond^g to thf Ld ' 
erated by RSO adder circuit 21 fil ! '^^^ 9^"' 
circuit 220. ^ provided to driver 

exeJ^;er;e~rn" ''-'"^ 
and RSt Provirrirec^i "!r''"" ''^"""^ ''^^ 
tries to addercircu fs ir^tfiT ex- 
onerations. Reser:t.^r;^rsTs2 ^ 

the data con^sponrgTtre ^^^^ - 
t'y the RSO and RS-i fw? ^^^^resses generated 

DATAAand DATA If rem P^°^'^«<^ as 

reservations^'i^nenrsa'r^^^^^^^^ 
cle. and one operation 1!! 5 ^ executed per cy- 

tion is a sto'e then^h^ ^"^ "'^ other opera- 

Which the s^^^fope^a^^^^^^^^^^ ^-m 

v.ded to store bX i80 ' P^o- 

a..o:;rn?:s;reZ^^^^^^^ 

is speculative. the ^Too J " ^^'^ « 

the load is the nextS to'S '"'"T^ 
load holds in the oonl ^"!' because of this, the 

forthereleases.gna?fromT'^^°^ '^-''^ 
indication along XhTde '' T'"'""''"''- 

124Sra"re?er:ar^^^^^^^^ 

a ^0-bit AoperaS?32 Sr 
bit displacement f ie^; 4 1 k '"''"^ ^ 
f ield. an 8-bit opcode fielit^d rn r^^^^^^^^ ^'"'"^^^ 
code information (INLsffietd ahw ; * ^ 
e-ion station eitryalVr^r^^^^^^^^ 
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hr^^AC^n^a^a^^r^----- 
tag (ATAGL) a Jbft B n I ''^'e 
GU). an 4-bi B opeldrdd. 
an4-bitBoperand,rer,'^f^^^^^ 
the corresponding A and B n^^ ^ 
Each reservation statinn . '^"'^ ^^"^ bits, 
sponding canlrbft (c^ ' '"'^'^'^^ ^ 

aretIgs%:rlTermrd:;X^"'''^^^^ 
teger operand The intnL ' """^'^"^ 
waybe,iuse. under he x86r:r' '"'"'^ '^'^ 
ble to reference ertherthff « Possi- 

lower half-word fhe fow^ °' ""^^ °f ^he 
bit double wo^Jof an Xinf^^^^^ °'' 
and L refer to the upper and .f' ""T"'''"^'^' ^ 
half wo«i and the i "Sfers to hr' °' '^'^^r 
a B operand and to rim? """^^ '^^'^"^"'^ 
operand (because ti^e rim?'"'"' "'"'^^ ''''^ an A 
emnd maV be eXr ST?^^^^^^^^^ ^ 
encing the lo«.er half wo^ the f an ' M T'" 
to the same value All thr^^ il ^re set 

value When refe^en^a astf T *° ^^-"^ 
ing in the reservatio:s'tation e^^^^^^^^ "'^''^^ 

is withinamisp^S^tedta;^ r!* ^"^ ^P'^-^e 
led In order to pieveTcan "S'd 
cache 150 from enter^'g^^^^ .tfSr^ 
stores that are executed 1 ""'"^'^ as " 

which is stored tndTlTTL'^^ ''^'^ °f ^" entry - 
cellediustreTurrhereru^'^^^^^^^^ Loads that are can- 
cache 150 and thus Se nnT ".?^''" •^^'^ 
does not update any s^te ''^"''"""^ ^ '-'^ 

erva«o: Sarntr I:"" ^^"^ °^ res- 
bit portion of rer;;;TaTrUT^^ 
Each input signal valid bit ^h/Jk '"Put signals. 

displacement field of the ri^^r V The 
coupled to the disp^c^mln? ! '°" ^"'^y is 

and INPUT 1 input 1^1 "Ir^''^" °^ '^^^T 0 
Of thereservarnsSnt^ '-Id 
tination tag PoZ olI^^J^S^; oTT.mo*° ^"^^ 
signals. The opcode fleW oMhl "''^^^ ' 
entry is coupled to 'he oDcodl 'T'''^'''^" 
O and INPUT 1 ino. . °f^^^ 'NPUT 

information ([nUS^^^^ 

try is coupled to the r^LS '^^^^^^ '^''on en- 
'-PUTi,p,3ig;3,r^^~^^ 

andlowe^bSofr" "^'^<"e byte tag. 



portion of tne iiNr w • reservat on sta- 

A and B operand '^^^'^^^^^^^^^ of the 

tion entry are coupled to tag v^' P 

'''"fhewpe match signals v.hich are generated by 
The type nnaicn y vvhether any m- 

,oad store -"^ ^J^^'^^^^^^ to the load store 

r^nnal unU More specif ically, when load store 
functional unit, wore »; , ^store function- 

''"""''»htf<^™^WsVs.'healoa<.storecon.rol- 

1 signal. . RS0addercircuit216 receives 

^"''^"'ronentsS reservation station 210 
address components ' .„„alADDRAaswell 
and providestherinearaddresss,gn^M)U ^^^^ 

as a valid segment access ^^qY^^ 
,,ein.ud.,ogi<..a^^^^^ 

S^Jt^^Larad^^a^ 

address adder ^'l^^^;;^^;^,^^^^^^ ad- 
nat from A operand '^''^'ifj'^ f^^^^^^ 246 and a dis- 

operand from rese vat ^^uiplexed and 

' rdTthe A operand addersignal is determined 
provided as the A ope j fg^^ation which is re- 

by the address mode <^^"^^'J^';^^ ^ ^pe^nd mul- 
ceivedfromloadsto.^^^^^^^^^ 

tiplexer js scaled based upon 

shift circuit 247. The B op^^^^^^^^^ 

the scale ^^^''^;^'^^l^^^rLpers^<i multiplexer 

is multiplexed and P^f ^J^^^,^^^^^^^^^^ control in- 
signal is determined by ^j^J^^^J^ re- 
formation. Displacement muMexerc 

eeives the <^f^^':::;^^'^Tois^:Zeu. multi- 
reservation station entry 2 0^^ J ^^^^^.^.^^ ^^^^ 

plexer -'^^^ /f^^^^^ two; the value which is 
five, minus four ana nmiu-. 



™,l.lole><e<l and provided as the displacement adder 
^i^Sfs Stennined b, «,e address ™de con.ro, ,n- 

'°™Fran aligned access load operation, the A op- 
„„d L seie«ed b, multiplexer 2«. the B operand 

= rs"rti%— ^^VoT^c^traro: 
rrr'a'Lrs'S^-word 

S::ro'adr==E 

Si^Tnr:«.ocKo,...hev.j«0.s.e^ed 

" ^ rjCt SeSed b, mt^tiplexe. 248. thus 
'^ J^ViX. adder the ,uantity4 tothe mis- 
causing =''<^"" " ^ For a multiple ROP oper- 

Tn e:°?S-'.«'-""p=«''--'^^ 
ation,e.g.,aoH-u.i.i r ,,„_j„__«t,onandad- 

,0 dress is generated as a ^^^^^^^Iss. Start 

— SHSrseC^cT^ 

25 tiplexer248 the va ^3,^^^ 5 ^ pro- 

" rjr^rbr^^xTa 

access size '"^e ^^e^^^ ^ -.^ subtracted; .if the ac, 
doubleword. then the vaiue . ^ted. The 

""-^'J^TrS^* 216 *o ^•5"'"" ''T.'" 

»^^-^crai:;s^^^^^ 

- ^ner4-S2andas^^^^ 

=^"nr.ro:iSi ad^r..^»9^' r 

receives the . g^cess signalindicahng 

and P-'<l«^^V/jt3Si^nTesegment.imits as 

r^rt^^thel^^^^^^^^^^ 

^""^l^de^ circuit 240 receives the A operand adder 
• fthl e operand adder signal and the displace- 
signal. the B oper^^° j ,5 ^ provide 

so ment adder signal f ^^^'^^^j, 242 adds the 

^ '^^'It^iraddX w" 
S de?X^--y 250. to the logical address to 

^niltrZ^sTsf-nar to RSO adder with the 
" ?h«t RSI adder 218 does not include mul- 

TT"248 beiuse unaligned accesses are only 
RSO reservation station. In RSI 
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a"y. because unaliqnedTl A'^'^'t'on- 
the values 4 and 5 ^'^''^'^218 is notprovided with 

303 as well as A oort m ' and SB3 

-erge circuit 308 A port?' « ^^t 

the A port data sLal frorn 

Portdatas^nalfSlTv^iV;^^^^^^^ - 
reservation station circuit ^^Tlll ^"^'^ °' 
nals to provide a meraed A ""^'^^^ '^^^^ ^ig- 

tf^e 8 port data slgna, ZlZ?" ^^'^-e^ 

Portdatasignalfrrmlrva^^^^^^^^^ 
reservation station circuft l la! ! ^"^'y "^^^ °^ 
nals to provide a merged e f^,!" """'^^^ '""^ ='9- 
entries SBO - SB3 Bv or 5 *° ^^^^-^ buffer 

308. a steenngTuncfr^Z^^^^^^^ ^'^'^ 

vation station circui"^ ?hT, r^''''^' 
merged With the thrie ilL T^'^ "^^^ 
Asignal which is p^reTrr"*'" '^"^ ^^-^A 

Circuits 206. 308 arT^ntrolfed H ""^^^ 
'er 182 based upon thTac!"f ^.^ ^"trol- 
two bits Of the linear addrS T' ^'gn'^cant 

a misaligned a^eL ;!? 
The steering functton whlh ^ '""^"^"^'^ 2. 
-«s 306. 3?8 is S'ibt e<S,rsf ' '^"^^^^ - 
read modify write operations rT T ^« 
ing function, data cache 7 '^'"^"^'"^ ^'^ ^^^r- 
P'ex steering circuCe<L'l'°^^^^^^^ 
cache 150 are 32 hitL l,^^ ^" accesses to data 

any. -'^ateveH^'n: r ^^^^^^^^^^^ - 
a reflection of what wi hi . «"»^y is 

thus allowing load Jto^" f uno ° . ""^"^ '^^^"^ "'SO. 
a load forwarding oTe,^«on '°"^ ""'""^ '° P^^-'de 
eration. loads mav bP n« ^ forwarding op- 

. »-,lystoredinTaLt?h?S'^ - 
buffer entries: load for jr'f°',^^«^^'ng ^ 

t'ons from the microprT^^trV /^^^^^^^ "P^-" 

Each store buffer emrv a,,o ^^' *""'"9Path. 
"als from the four ZZ Lt Z^^"^' '"^ ^'9- 
ADDR B address signT LI '"^ ^ 
ar^d the TAG A and B tToTT^" ^^4 
tion station124 as 1^7. ^ ^^"^'^ ^«serva- 
storecontroller 1 2 Th se ^''^"^'^ 'oad 

'oad Signals and the sWf?,t? "^"^'^ '"'^'"''e the 
hufferentrySBO e^vTirr?-^^*^'^^^^ 
entry SB1 and proSs a ste?."' '^"^ '""'^ '"'"^^ 
Store buffer entry SB 1r?.? ''""° ''"^ ^02. 

output from store^bulVrrrira^ 
a store buffer entry outouf f ! "^^'^^s 
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vte^iTor^ri:^ 

SB3 receives a storr^u^f:" ent '° ^^'^ ^'"--^ ^-^'^^ 

bufferentriesSBO.SBlanTsBTi' T'""' '^"^ 
buffer entry outpui to sS "'^^'''^^ ^ ^'o-'e 

^^e&rr:,:~-sBi-sB3with 

tries, a store forwa^i nropl'^H^ '"^'^^ 
ample. thestorebuffererysS^'^P^^ 
s-gnificant store buffer eZTlL^'°''''^^'^ "^^e 
these store buffer enTries to cL5' ' '° ^llow 
buffer entry with the more "ionTf 
the entries have the samel? ^ben 
'y. When the store bS J!^'^': According- 

V.OUS store is stored in date^!h . f ^ P''^- 

the X86 architecture ast^f ^ S«<=ause. fn 
"t-e byte accesSSr '?"^""'"'"^°^«'"««<^- 
cantly increases the speed win ??^^**'"9 signifi- 
formed by removing the depend^ P^^" 
ation on a store ope^Ifof '""''""'^^ °' ^ 'oad oper- 

Sei^S-^orebufterentrySBO- 
«on set forth in storeTffe 1 ' '"^^^'"a- 
entry 339 includes 32 bTdat^t^?"' ''"^^^^ 
portion 341. 32-bitlinearaddl^^^^^^^^ ^9 
formationportion344 DatadnnJ " 

^ourdata bytes, data b^o'ttabrr^"^"^^^ 
Tag portion 341 include, r J 

Which correspond to CbtesT 3 ^vT 
t.on mdudes a byte 0 tag (TAG B^T^p nf^^ ° por- 
trol bit (BO) and a byte 0 tea ^fll ^ ''y*^ » ^on- 
Portion -dudesa b^i:?5 "^G ^^^r^f^^ ' 
control bit (B1) and a h^/f« / ^ byte 1 

tag PortionUTel a b^e 2 ir^In'' 2 
2 control bits (BO 81) and . it? ^ ^^"^ 
Byte 3 tag poiion fnldes a ' "^"'^ ''^ 

3). byte 3 control bits (Bo b.) .^/ ^^""'^ 
bit (TV). * °' and a byte 3 tag valid 

The byte tags TAG BYTE 0 -^n.-, 
tneve data bytes 0 fr *ags to re- 

control bits indicat^ fr« T ""^^s- The byte 

byte should ^'r^l'^J^d"^^^^^^^^ 
byte Ocontrol bitBO is set iti^d .""^'"'^''^"y- ^""^^ 
be forwarded from a resu, h ^'^'^ ^f^ould 

trol bit BO is cleaned hen dalL' T ''''^^ ^ <^"- 
from a result bus b^e 0 w^- '^^^^^''^'^ 
is set. it indicates that dataTho^ld^! f 
a result bus byte 0- if hv^l . ® forwarded from 

tben data should b; forwarded 

1- When byte 2 contS^re. '"'""''^^"'^''^^byte ' 
data Should be forwardld f rfm " " '""'^^'^^ '''at 
When byte 2 cont^l^Bo ™" f ^ ^"^ 

should be forwarded from a ri k ''^'^^ '^^'a 
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'n:::^ieb:«e,'..9S indicate .ha. acu^ byte !»■ 

S and RS1 vvait until valid data Is received by the 
Teservln station and then the da^ Is prov.ded as 
wo store buffer entries to the store buffer, 
^ntrol portion 344 includes a store bu«^^^ 
r.H hit A/^ a 2-bit unaligned access control signal 
m^awritt protect bit (WP).anon-cachablestoreb. 

NSan2put/Outputaa.essbit(.0).af.oat.^^^^^ 

Srt cLti entry is valid, i.e.. that t^jere 'S son.e v^^^^^ 
■ ^, the f irst portion or the second Port;on. of an^un 

ab..a„dacc«,*n9,,-,.^^^^^^ 

r.rerna . - Z an „0 access is occr- 
^ Thfohvtcai access bi. indlca.es .ha. .be mer^ 
'r manageme^ uni. should b,pass O'-J^^J" 

?:S:arrru,:iin,e«be..bepa^..ec^^^^ 

daui ceche (ha. is being =°=°Ce i*e1M 

no need .o perform a colun-n lookw ». daa cache 150 

"";Snt.;'Frrsrbren.,,ci,c„i.sBa 

aoafsthZi an exannpieof each store bu«e,e*, 
:;l;i%,«reb.»«e.r,*^i.^^^^^^^^ 

data bytes 0 - 3 of store buffer entry 339, store buffer entry tag
multiplexer 370, which corresponds to the tags of store buffer entry
339, and store buffer entry address multiplexer 372, which corresponds
to the address of store buffer entry 339. Store buffer entry circuit
302 also includes tag compare circuit 374 and address compare circuit
376. Store buffer entry register 360 includes store buffer data entry
register 380, store buffer address entry register 382, store buffer
tag entry register 384 and store buffer control entry register 386.

Store buffer entry register circuit 360 is a register which receives a
store buffer entry 339 in parallel from store buffer entry data byte
multiplexers 362 - 365, tag multiplexer 370 and address multiplexer 372
and provides a store buffer entry 339 in parallel to store buffer
entry circuits SB1 and SB3. Additionally, store buffer data entry
register 380 provides data bytes 0 - 3 to data port A and data port B
of reservation station driver circuit 220. These data bytes are
provided to allow load store functional unit 134 to perform a load
forwarding operation.

Byte multiplexer circuits 362 - 365 receive respective bytes from A
merge circuit 306, B merge circuit 308 and the four result buses as
well as from store buffer entry circuits SB3, SB0 and SB1. Byte
multiplexer circuits 362 - 365 are controlled by store buffer control
signals which are provided by load
buffer comro' '» y ,(^33^ addresses 

'*°'^'^K^lbufSrT^^^^ 
" ^ddTsses"™^^^^^^ the reservation station 

fhe result buses are controlled by store buffer contro 
IlTnSs which are provided by toad store controller . 

1 82 based upon whether a tag valid bit is present for 
182 baseo up _ ^^^^^^ ^ 

b!Sdmu..iple.es»hicbeve,.es*bushasavai. 

ue which matches the tag. . 
For example, byte multiplexer circuit 362 .re 
• ^ht hvte 0 data from each of the Amerge sig- 

" rarBmegeSgnauL 

nal. B merge sig ^^^^ ^^^^^ 

■M ^ ^n*. of each of these data bytes as the SB2 store 
. wh^^^^^^^ store buffer register cir- 

Each byte which is stored in store buffer data register 380 is a
direct reflection of what is stored in memory; accordingly, byte
steering is provided to arrange the data bytes to correspond to what
is stored in memory.

alTeering is provided by providing byte mul- 

362 and byte multiplexer 1 363 with inputs 
t,plexer 0 362 ^Vje P ^^^^ 

from the four result Dus oyie 
hTbvte 1 in parallel, providing byte multiplexer 2 364 

r'k-rsrvTrrrrbTrrn 
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tiftn t OA , ^ ^'9"^' reservation sta 

dres.p„„,<,„ 3420,, J'";-;^- ad- 
compare arcuit 37? whi,-h =t ' address 

and ADDR etlate fr^^ *° ""^'"'^ AO"" * 
dress "".^"^'^ '«=«™alfonslatlon 124 Ad. 

ADDR B Signals to linear eddresa 342 every do* 

a*d,'Lr:drsTr2 rr;"''''*"'^'^^^ 

Tag multiplexer 370 receive., thl f , 
buffer entries SBO. SB1 and s^^ f ^ *^^f 

arso recedes tagsVro:;"n?nJ;^T~ 

suit buses The taas f rnm fh ^ ^^'^ ^^^^ 

by tag cental ciSSTr? '^"'^^^'"°"'*°^«^ 
register 384 ma S ^ 1? '^"'^ 
es. then tag contr? ciSt ^4°^ 
Plexers 362 -Jr? ^> ^ ^^"''"'s ''V^e multi- 

ing store buffer data regls't «>rrespond- 

p-ovSr; ST^roon'tr s:?"' r 

control register 386. *° ^"^^e-- 

Store buffer entry circuits SBO, SB1 and <?R^ 

^^e7r.r:r:s:;:-T-"-" 

buffer entry SBO r.^^?. ^ ° specifically, store 

bufferen.,;!! °r.7r'e: risr r 
burrs -'-™«ordTBT7. : 
a.o,?;u°siBo,'si:rdSBr*'" 

usin^ the st^r^'yi Is tnl:! °P"="<"'»- By 
tlons need not nec^ssSJ^f ""^ 
Additionally by u^o S?I « 

tion with IhLlZ S ""'^ '=8' <" '""Wno- 

ally b«S^ forwarding operations. Addition. 

upinsS „p\ra^ons°Sr' *>'^"''«"' 
indatacaohe%67s,«bJ^, 
forwarding operations """""^ "^^ 

valid lnthetfe?va?tan sS:;, e','"''' "^^ ° " ^ 
.he respect^e tag v"d brAnTpdl'"''''"^^.'^^ 
wbenafunctionalunitisgolngtrb^^ r.;eTr 
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ctrpr:re"s"rchT^: °r -'^e 
fersto^torfbuffe"d,e!.i 8?^^^^^ 
entry RSO. TheXoDe^nw ' ^'^^^'^^^ 
^ the reservatton slatfnn o 1 ^TAGU of 

and byte in h "store t ''^^ ^ 

and ATAGM reserlaL? f f k ^^^GL 
ed as the sto^^S r^^^^^^^^ ^ P-'^" 

tiveiy. (,n the case of a dlbie:'^!^^^^^^^^^^ 
'0 tags are actually the sam« \ m-. / ' ' °' '^^^^ 
bits BO and B1 arfset V^I^h °' 

abiebythefuncZafuSo^rrr^^^ 
eacli byte tag against the J.. compares 

buses, usin/tag c^a™ ^37:3" 
« lipiexers 362 - 365 aa^.i^T. . ^"^ "^"9 "^"1- 

lag. In the case oi , a match on a 
matches l^iJnSir"'™"' 

date a byte o, .^eiard^rr^ ."°■ 
'or byte I.Cs differentln " """"" '^^ 
ond byte store has oc^^^jT'"'' 
first doubleword is slS^7^' . 'Pacifically, the 
With four vairC an^The'b J 

placed in byte 1 whii.Vk . *^ """'^"'""ag 
a™ forwaiSSfmm SBO A^'."'"" "'^ ^ ' ~ 

ing achie,:^':h7*"^,tSmeT't:'r '°r ■ - 
^ng rbroo':tr^"Sr '^'^ - 

" upda,:',: bTlTandl' '^'^'='«ha pending 

which coTOSDonri^ to . bus 

register 3rSrpecttIJ;VH^ ''"'^^^ 
to the case wh^e fhtrll ' ^PP"«« 
to a word J^STs stX/fn a ^^^^^^^^^^^ 
^5 two bytes in the storrhMfr "^^^^ ^he 

.re£.^^:~;-;-;^^^ 

ta. ^^^i^^S^S^Z'-'-^'^ ^ 
er the source byte is a hiah ^ ."^ 
^0 this tag matches rt grsTa^'JromrJ^^^^^^ 

Of the result bus jf^ZrtflZ ^ '""^^'^^ ^^'^ 
a registerthat has rioH " ''y*^ ^tore of 

date In this cLse thrn^'"^ °' doubleword up- 
sponding lo'caTn o ;e^r:;:,r^^^^^^^^ °" 
entire bus may contaL vaHd dl^ ' 

e-romalesssiJ::S-:^bXt:e^ 
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that from data cache 1 50. As a result, store buffer 1 80 
inserts a tag into a data word that already has a tag in it. This can
happen, for example, when more than one byte is written into the same
doubleword within a short span of time. Accordingly, the information
that is stored in the store buffer entry can have more than one tag,
each representing a different result. In operation, each tag compares
against the result buses and gates in the proper byte at the proper
time. Because unaligned stores are not allowed to write tags into
store buffer 180, awkward forwarding cases do not occur.
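
The per-byte tag matching described above can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, and it assumes each pending byte takes its value from the same byte lane of the result bus, which simplifies the byte steering performed by the hardware.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoreBufferEntry:
    # Four data bytes; a byte with a non-None tag is still waiting for a result.
    data: List[int] = field(default_factory=lambda: [0, 0, 0, 0])
    tags: List[Optional[int]] = field(default_factory=lambda: [None] * 4)

    def snoop(self, bus_tag: int, bus_value: int) -> None:
        """Compare every pending byte tag against one result bus broadcast."""
        for lane in range(4):
            if self.tags[lane] == bus_tag:
                # Assumption: take the byte from the matching lane of the bus.
                self.data[lane] = (bus_value >> (8 * lane)) & 0xFF
                self.tags[lane] = None  # the byte is now valid

    def pending(self) -> bool:
        return any(tag is not None for tag in self.tags)


# One doubleword holding two different pending tags (7 and 9).
entry = StoreBufferEntry(data=[0x11, 0x22, 0x33, 0x44], tags=[7, None, 9, None])
entry.snoop(7, 0xAABBCCDD)
entry.snoop(9, 0x01020304)
print([hex(b) for b in entry.data], entry.pending())
```

Each tag resolves independently, so two results that arrive at different times can both land in the same doubleword.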

When performing a load operation, address compare circuit 376 of the
store buffer 180 compares the linear address which is provided by the
RS0 and RS1 adders to the linear addresses of the store buffer
entries. If there is a match between the load address and the address
stored in one of the store buffer entries, as indicated by the hit
signal that address compare circuit 376 provides, then load store
controller 182 determines that the load is dependent on the store. If
the load is dependent on the store, the data from the store buffer
entry which provided the linear address match is provided via the port
on which the matching address was provided. This operation is referred
to as a load forwarding operation.
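
A minimal sketch of the address compare behind a load forwarding operation is given below. The function name is hypothetical, and two points are assumptions not stated in the text: addresses are compared at doubleword granularity, and the most recently buffered matching store supplies the data.

```python
from typing import List, Optional, Tuple


def forward_load(load_addr: int,
                 store_buffer: List[Tuple[int, int]]) -> Optional[int]:
    """Return buffered store data for a matching load, or None on no match.

    store_buffer is ordered oldest first; each item is
    (linear_address, doubleword).
    """
    wanted = load_addr & ~0x3  # assumption: compare doubleword addresses
    for addr, data in reversed(store_buffer):  # assumption: newest match wins
        if (addr & ~0x3) == wanted:
            return data  # hit: the load depends on this store
    return None  # no dependency; the load is served by the data cache


buffer = [(0x1000, 0xAAAA), (0x2000, 0xBBBB), (0x1000, 0xCCCC)]
print(forward_load(0x1002, buffer), forward_load(0x3000, buffer))
```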

Referring to Fig. 9, data cache 150 is a linearly addressed cache.
Co-filed application (Attorney Reference PCSnT0272/SMP), based on US
application 08/146,381, which is incorporated by reference, sets forth
the structure and operation of the linear addressing aspects of data
cache 150 in greater detail.

An entry 400 of data cache 150 is shown. For each entry of data cache
150, the middle order bits of each linear address corresponding to the
cache entry provide a cache index which is used to address the linear
tag arrays and retrieve an entry from each linear tag array. The upper
order bits of each linear address are compared to the linear data tags
stored within the retrieved entries from address tag array 310. The
lowest order bits of each linear address provide an offset into the
retrieved entry to find the actual byte addressed by the linear
address. Because data cache 150 is always accessed in 32-bit words,
these lowest order bits are not used when accessing data cache 150.

Data cache entry 400 of data cache 150 includes linear address tag
entry 402 and data entry 404. Data entry 404 includes a sixteen byte
(DBYTE0 - DBYTE15) block of data. Data linear address tag entry 402
includes a data linear tag value (DTAG), linear tag valid bit (TV),
and valid physical translation bit (P). The data linear tag value,
which corresponds to the upper 21 bits of the linear address,
indicates the linear block frame address of a block which is stored in
the corresponding store array entry. The linear tag valid bit
indicates whether or not the linear tag is valid. The valid physical
translation bit indicates whether or not an entry provides a
successful physical tag hit as discussed below.
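
The way a 32-bit linear address breaks down into these fields can be sketched as below. The 21-bit tag and the sixteen byte block come from the description; the 7-bit line index is an inference from those two figures (32 - 21 - 4 = 7), and the field and function names are hypothetical.

```python
def split_linear_address(addr: int) -> dict:
    """Split a 32-bit linear address into illustrative cache fields."""
    return {
        "tag": (addr >> 11) & 0x1FFFFF,  # upper 21 bits, compared with DTAG
        "index": (addr >> 4) & 0x7F,     # assumed 7-bit line index (bits 4-10)
        "bank": (addr >> 2) & 0x3,       # which 32-bit word of the 16-byte block
        "byte": addr & 0x3,              # unused when accessing 32-bit words
    }


fields = split_linear_address(0x12345678)
print({name: hex(value) for name, value in fields.items()})
```

Reassembling the fields as `tag << 11 | index << 4 | bank << 2 | byte` gives back the original address, which is a quick check that the fields do not overlap.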

Referring to Fig. 10, data cache linear tag circuit 202 and data cache
store array 200 of linearly addressable data cache 150 are shown. Data
cache 150 is arranged in four 2-Kbyte columns, column 0, column 1,
column 2, and column 3. Data linear tag circuit 202 simultaneously
receives the two linear addresses ADDR A, ADDR B and data store array
312 simultaneously provides the two data signals DATA A, DATA B, i.e.,
data cache 150 functions as a dually accessed data cache.

Data store array 200 includes four separate data store arrays, column
0 store array 430, column 1 store array 431, column 2 store array 432,
and column 3 store array 433, as well as multiplexer (MUX) circuit
440. Multiplexer 440 receives control signals from data linear tag
circuit 202 which indicate whether there is a match to a linear tag
value stored in a respective linear tag array. Multiplexer 440
receives the data from store arrays 430 - 433 and provides this data
to load store functional unit 134.

Linear tag circuit 202 includes linear tag arrays 450 - 453
corresponding to columns 0 - 3. Each linear tag array is coupled with
a corresponding compare circuit 454 - 457. Accordingly, each column of
data cache 150 includes a store array, a linear tag array and a
compare circuit. Store arrays 430 - 433, address tag arrays 450 - 453,
and compare circuits 454 - 457 all receive the linear addresses, ADDR
A, ADDR B from load store section 134.
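
The column organization can be modeled with a short sketch in which every column is indexed with the same address and a tag compare selects which column supplies the data. The class and function names are hypothetical, the 128-line column depth is inferred from a 2-Kbyte column of sixteen byte lines, and the loop stands in for compare circuits that operate in parallel in hardware.

```python
class Column:
    """One column: a linear tag array plus a store array of 16-byte lines."""

    def __init__(self, lines: int = 128):  # 2 Kbytes / 16 bytes per line
        self.tags = [None] * lines         # None models a clear tag valid bit
        self.data = [bytes(16)] * lines


def split(addr: int):
    return addr >> 11, (addr >> 4) & 0x7F  # (21-bit tag, assumed 7-bit index)


def fill(columns, col_no: int, addr: int, block: bytes) -> None:
    tag, index = split(addr)
    columns[col_no].tags[index] = tag
    columns[col_no].data[index] = block


def lookup(columns, addr: int):
    """Return (column number, 16-byte line) on a tag match, else None."""
    tag, index = split(addr)
    for col_no, col in enumerate(columns):  # parallel compares in hardware
        if col.tags[index] == tag:
            return col_no, col.data[index]  # the output mux selects this column
    return None


columns = [Column() for _ in range(4)]
fill(columns, 2, 0x12345678, b"A" * 16)
print(lookup(columns, 0x1234567C), lookup(columns, 0x92345678))
```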

IAD bus 102 is coupled to each store array 430 - 433 via store address
multiplexer 460 to provide a store address. IAD bus 102 is also
coupled to store register 460 which is coupled to each store array
430 - 433. The store address, which is provided by IAD bus 102, is
provided to index a particular column and to select a particular bank;
the particular column is selected by column select bits, which are
provided either by store buffer 180 when performing a store or by
physical tag circuit 162 when performing a reload. For a store, only
one bank is accessed. The bank select bits, bits 2 and 3 of the
address which is provided by IAD bus 102, are used to access the bank.
For a reload, all four banks are accessed in parallel.

IAD bus 102 is used during both store operations and reload operations
to write data to store arrays 430 - 433 of data cache 150. When
performing a store operation, data is written in store arrays 430 -
433, via store register 460, in 32-bit doublewords. For a store buffer
write, the IAD bus address is provided to the ADDR B input to data
cache 150. ADDR B and the IAD address are multiplexed by address
multiplexer 461.

When performing a reload operation, data is written into store arrays
430 - 433 in 128-bit lines. Store register 460 collects 128 bits of
data from IAD bus 102 in two 64-bit accesses; after the 128 bits are
collected, store register 460 writes this data to store arrays 430 -
433. Store address multiplexer 460 multiplexes the address lines of
IAD bus 102 to receive data, because 64 bits are written in each
phase. Address multiplexer 461 multiplexes the IAD address onto the
ADDR B address path to index into the rows. Data cache store
multiplexer 460 is controlled by data cache controller 190 based upon
whether a store or a load operation is being performed. For a reload
operation, the reload address is provided via port A of data cache
150; accordingly, data cache 150 uses ADDR A for a reload address.
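
The accumulation of a reload line from two 64-bit accesses can be sketched as follows. The class name is hypothetical, and the ordering (first access is the low half, bank 0 is the lowest 32 bits) is an assumption made for illustration.

```python
class StoreRegister:
    """Collects two 64-bit transfers, then releases one 128-bit line."""

    def __init__(self):
        self.halves = []

    def collect(self, half64: int):
        """Return four 32-bit bank words once both halves have arrived."""
        self.halves.append(half64 & 0xFFFFFFFFFFFFFFFF)
        if len(self.halves) < 2:
            return None  # still waiting for the second 64-bit access
        # Assumption: the first access carries the low 64 bits of the line.
        line = self.halves[0] | (self.halves[1] << 64)
        self.halves = []
        return [(line >> (32 * bank)) & 0xFFFFFFFF for bank in range(4)]


reg = StoreRegister()
print(reg.collect(0x2222222211111111))
print([hex(word) for word in reg.collect(0x4444444433333333)])
```

Nothing is written to the banks until the full line is available, which matches the description of the reload as a single 128-bit write.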
Referring to Figs. 11 and 12, each store array of data cache 150
supports multiple accesses without requiring the overhead associated
with dual porting. More specifically, each store array includes four
banks 470 - 473 which each store a 32-bit doubleword of data; each
bank includes a respective bank address multiplexer 474 - 477. The
combination of the four banks provides access to a single line of data
cache 150.

Banks 470 - 473 are each individually addressed by either ADDR A or
ADDR B, which address is provided by the respective bank address
multiplexer 474 - 477. Bank address multiplexers 474 - 477 are
controlled by the bank select bits of ADDR A and ADDR B. Because each
bank is individually addressed, more than one bank may be accessed
simultaneously.

For example, when ADDR A addresses a line of bank 0 and ADDR B
addresses a line of bank 2, then multiplexer 474 causes ADDR A to be
provided to bank 0 and multiplexer 477 causes ADDR B to be provided to
bank 2. The data word which is addressed by ADDR A is provided to load
store functional unit 134 as DATA A via the DATA A data path and the
data word which is addressed by ADDR B is provided to load/store
functional unit 134 as DATA B via the DATA B data path.

If ADDR A and ADDR B both address the same line of bank 0, then only
this line and bank is accessed and the data at this location is
provided as both DATA A and DATA B via the DATA A and DATA B data
paths, respectively.

If ADDR A and ADDR B access the same bank but different lines, then
the port B access is stalled for one cycle by data cache controller
190. Because data cache accesses are generally random, the likelihood
of port A and port B accesses to the same bank but different lines is
relatively low.
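
The three dual-access cases (different banks, same bank and same line, same bank but different lines) can be summarized in a small sketch. The function name and the returned labels are hypothetical; the bank is taken from address bits 2 and 3 as stated for the bank select bits, and the line is assumed to be identified by bits 4 through 10.

```python
def arbitrate(addr_a: int, addr_b: int) -> str:
    """Classify a simultaneous port A / port B access to a banked array."""
    bank_a, bank_b = (addr_a >> 2) & 0x3, (addr_b >> 2) & 0x3
    line_a, line_b = (addr_a >> 4) & 0x7F, (addr_b >> 4) & 0x7F  # assumed index
    if bank_a != bank_b:
        return "parallel"  # each bank is addressed by its own port
    if line_a == line_b:
        return "shared"    # one access supplies both DATA A and DATA B
    return "stall_b"       # port B waits one cycle


print(arbitrate(0x100, 0x108), arbitrate(0x100, 0x102), arbitrate(0x100, 0x110))
```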
Write accesses to data cache 150 are via IAD bus 102. During a store,
multiplexers 474 - 477 use the store access to control which of banks
470 - 473 is written with the 32-bit store doubleword. During a
reload, banks 470 - 473 are written in one 128-bit line after the
reload data has been accumulated in store register 460.

Upon detecting a cache miss, the requested value is written into an
entry of data cache 150. More specifically, load store section 134
translates the logical address for the value to a linear address. This
linear address is provided to memory management unit 164. The linear
address of the value is checked against the linear tag portion of the
memory management unit's TLB array by a TLB compare circuit to
determine whether there is a TLB hit.

therrwataTB^hrlr^':"'^ that 
mere was a TLB hit. then load store functional unit 

1 34 examines the date to determine whethe^the datl 

450' 4^ h'h''^' '^^^^ 'inear teg aaaT 

If there is not a TLB hit, then the TLB array is updated by memory
management unit 164 to include the address of the requested value so
that a TLB hit

:lit?62"ar t^T^' Ph^-ft^ 
P^artg^^^^^^^^^^ - ^ . 

A pre-fetch request is then made by load/store functional unit 134 to
the external memory and the value which is stored in the external
memory at the physical address which corresponds to the linear address
is retrieved from the external memory. This value is stored in the
bank, line and column of store array 200 which corresponds to the line
and column locations of the value's linear tags which are stored in the
Z7T^ r^'^- '^^ <=°'^«sPond'n9 linear tag vaHd 

an-ay 310 are set to indicate that the entry corr^ 
spending to the linear tag is valid, that the I near teo 

When the linear address for this value is again requested, load store
section 134 converts the logical address to the linear address, which
provides a match of the linear tags in linear address tag array 310
with the requested address. Because the valid bit is set and the valid
physical translation bit is set, a linear address hit occurs

Ime of date store array 304 is provided to load/store - 

ern":L""^hr ''""■"^ ^^^^ ^^^'^^^ 

phylTc^l a^drlr^ " "° '"^^'^ '^e 

pnysical address tag circuit 162 or TLB circuit lfU 

since the valid physical translation bit is setTlat n^ 

tbat the entry has a valid physical translation ' 
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Referring to Figs. 1 - 10 and Fig. 13. when a load 
operation is being performed by load/store functional unit 134 via
port A and the data value to be loaded is available in data cache 150,
then a data cache hit results. More specifically, during Φ1 of cycle
1, the cache index is calculated by adder 240 of RS0 adder 216; the
cache index is the least significant 11 bits of the linear address and
is calculated as part of the linear address compute. This cache index
is used to access the appropriate line and bank of data cache store
array 200. When the appropriate line and bank is accessed, the linear
address, which is calculated by adder 242, is used to access the
appropriate column of store array 200 by comparing the linear tags.
The data value is then returned to driver circuit 220 of reservation
station circuit 124 via the DATA A data path. This data value is
formatted by driver circuit 220 for providing to result bus 0. During
Φ2 of cycle 1, the limit check circuit performs a segment limit check
and a protection check, as is well known in the art, on the linear
address. During Φ1 of cycle 2, the data value and corresponding
destination tags are driven onto result bus 0.

While a load operation is performed via port A, a corresponding load
operation may be performed via port B. This corresponding load
operation uses reservation station RS1 along with its corresponding
adder to perform the address generation of the data cache access. The
data value and corresponding destination tags for the entry in
reservation station RS1 are

Referring to Figs. 1 - 10 and Fig. 14, when a store operation is being
performed by load/store functional unit 134 via port A and the data
value to be stored is already stored in data cache 150, then a data
cache hit results. Because stores are performed as read modify write
operations, the first portion of a store operation is similar to a
load operation. After the data is loaded, the loaded value is written
to store buffer circuit 180 in order to modify the loaded value.

More specifically, during Φ1 of cycle 1, the cache index is calculated
by adder 240 of RS0 adder 216; the cache index is the least
significant 11 bits of the linear address and is calculated as part of
the linear address compute. This cache index is used to access the
appropriate line and bank of data cache store array 200. When the
appropriate line and bank is accessed, the linear address, which is
calculated by adder 242, is used to access the appropriate column of
store array 200 by comparing the linear tags. The data value is then
returned to driver circuit 220 of reservation station circuit 124 via
the DATA A data path. This data value is formatted by driver circuit
220 for providing to result bus 0. During Φ2 of cycle 1, the limit
check circuit performs a segment limit check and a protection check,
as is well known in the art, on the linear address. During Φ1 of cycle
2, the data value and corresponding destination tags are driven onto
result bus 0 for port A and are also stored in the next available
entry of store buffer circuit 180. This value is held in store buffer
circuit 180 until the store operation is retired from reorder buffer
114, which occurs when there are no other instructions pending.
Reorder buffer 114 then indicates to load/store controller 182, using
the load store retire signal, that the store instruction may be
retired, i.e., that the store may be performed. Because stores
actually modify the state of the data value, stores are not
speculatively performed and must wait until it is clear that the store
is actually the next instruction before reorder buffer 114 allows the
store to be executed.
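
The rule that a buffered store changes cache state only after the reorder buffer releases it can be sketched with a simple first in first out queue. The class and method names are hypothetical, the four-entry depth follows store buffer entries SB0 - SB3, and the dictionary stands in for the data cache.

```python
from collections import deque


class StoreBuffer:
    """Holds stores in order until the retire signal releases the oldest."""

    def __init__(self, depth: int = 4):  # four entries, as in SB0 - SB3
        self.depth = depth
        self.entries = deque()

    def push(self, addr: int, data: int) -> bool:
        if len(self.entries) == self.depth:
            return False  # buffer full; the store must wait
        self.entries.append((addr, data))
        return True

    def retire(self, cache: dict) -> bool:
        """Model the retire signal: write the oldest store into the cache."""
        if not self.entries:
            return False
        addr, data = self.entries.popleft()
        cache[addr] = data  # only now does the store modify state
        return True


cache = {}
sb = StoreBuffer()
sb.push(0x100, 1)
sb.push(0x104, 2)
print(cache)  # still empty: nothing has been retired
sb.retire(cache)
print(cache)
```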

After reorder buffer 114 has indicated that the instruction may be
performed, the data value and corresponding linear address are driven
to IAD bus 102 during Φ1 of the cycle following the release of the
instruction. During Φ2 of this cycle, the data value is written to the
appropriate line and bank of data cache store array 200. Additionally,
if physical tag circuit 162 indicates that the value should also be
written to external memory, then the data value is written to external
memory at the physical address location which corresponds to the
linear address. The physical address translation is performed by
memory management unit 164, which also receives the linear address
from IAD bus 102.

Referring to Figs. 1 - 10 and 15, when a speculative load operation is
being performed by load/store functional unit 134 and the data value
to be loaded is not available in data cache 150, then a speculative
data cache miss results. The first cycle of the load operation is the
same as if a cache hit had resulted.
When cache 150 is accessed and the cache miss results, then during
cycle 2, the TLB is accessed in memory management unit 164 and the
physical tags are accessed in physical tag circuit 162 to determine
the physical address of the data value. This physical address is then
checked within memory management unit 164 to confirm that the physical
address does not violate any protection checks. During the next cycle,
if the port B access is not to the same bank of cache array 200, then
port B initiates another cache access. Additionally, during Φ2 of this
cycle, cache array 200 is updated with the tag valid bits of the line
from the tag buses. During the next cycle, the data value, destination
tag and status are driven onto the next available

" Tb^ resuXs Vnd normal operation, which assumes 

Referring to Figs. 1 - 10 and 16, during a cache reload, the first
cycle of the reload operation is the same as if a cache hit had
resulted. However, after cache controller 190 determines that a cache
miss resulted, load/store functional unit 134 waits for store buffer
circuit 180 to empty before accessing external memory
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able Signal (L22lS whi^h h '"'^^ ^ ^^'^ ^^^i'" 

array 200. then t^elT Tr"''^" ^'^'^^ ^ 

information is driven on oVer ;".'"°" 
'«*>ervation station circuit 124 

«ated by drlver ctS^^^^^^^^^^^^ 

complete, and the dat hfl h ^'^'^'^^^^^^ 
data is fo matted afdt T ^"^^^'""'ated. the 

220 Of reservation station drc^! 
cesses are only oerfn™! Misaligned ac- 

0. Accordin Jy LTtheTso add' 

tion of driver cir2220^ ?'^"*^'''^'^^*^ 20 
Other Embodiments
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Claim?" ^'"^"^--^^ are within the following 

be ^:^ir:;^:^rT^ 

ft^nctional unit and a sto- f - ' ' 
bodiment. the 003^3110^ ^"^ 



Claims 
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"* load funciona, „ 
Station circuit including reservation 




to the first and second reservatino 
and a second load sign" "the fL^t^^^^^ 
reservation station entries, and 

trolling which of fh/f '°f fo"- con- 

^.na,; r„r,;;r„r ^^^^^ 

entries retrieve and -reservation station 

ond data cache ports tho ^"^^^ sec- 

2. The load functional unit of daim i ..h • 
=a<*e data l„ oaralLl"^ ""^ data 

4. The load f„naio„a, „„„ „, , 

s.a>id„a„,.,.o„,'„rJr,a„Vr<rd°'"''°" 

• The foad f„naio„al uhit o, claim 4 wherein 
oWee a fr«r"°" "rthe, in-.- 

'ourth ri^vatn'Se: ' k'^" 

the third reservain„ . ?. " ""O'*" 1° 

a fourth s 

— ervati„n.,a,ir„;:"rh'err.r:r 
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vation station entry being coupled to the second 
reservation station entry and providing the fourth 
reservation station entry output to the second 
reservation station entry. 

one of the third and second reser- 
vation station entries retrieving the fourth reser- 
vation station entry output under control of the 
load control circuit. 

6. The load functional unit of claim 1 wherein the reservation station
circuit further includes

first and second adder circuits coupled to the first and second
reservation station entries, respectively, the first and second adder
circuits receiving the load signals and providing cache address
signals based upon the load signals, the cache address signals
accessing respective first and second locations within the data cache
store array.

7. The load functional unit of claim 6 wherein the first and second
adder circuits each include

a logical address adder for receiving a plurality of address component
signals and providing a logical address signal, and

a linear address adder for receiving the logical address signal and a
segment base signal and providing a linear address.

8. The load functional unit of claim 7 wherein the address component
signals include

an A operand adder signal, a B operand adder signal and a displacement
adder signal.

9. The load functional unit of claim 8 wherein the first adder circuit
further includes

an A operand multiplexer circuit for receiving an A operand signal and
a zero signal and providing one of these values as the A operand adder
signal in response to address mode control information from the load
controller,

a B operand multiplexer circuit for receiving a B operand signal and a
misaligned address one signal and providing one of these signals as
the B operand adder signal in response to address mode control
information from the load controller, and

a displacement multiplexer circuit for receiving a displacement
signal, a four signal and a five signal and providing one of these
values as the displacement adder signal in response to address mode
control information from the load controller.

10. The load functional unit of claim 8 wherein the second adder
circuit further includes

an A operand multiplexer circuit for receiving an A operand signal and
a zero signal and providing one of these values as the A operand adder
signal in response to address mode control information from the load
controller, and

a B operand multiplexer circuit for receiving a B operand signal and a
misaligned address one signal and providing one of these signals as
the B operand adder signal in response to address mode control
information from the load controller, and wherein

a displacement signal is provided directly to the logical address
adder.



11. A store functional unit for performing a store for- 
,5 ' warding operation comprising: 

first and second store buffer entry circuits 
for holding store operations, the second store 
buffer entry being coupled to the first store buffer 
entry and providing a second store buffer entry 
output to the first store buffer entry, the first s ore 
buffer entry being coupled to the second store 
buffer entry and providing a first store buffer en- 
try output to the second store buffer entry; and 
a store contrdler for contrdling whether 
,5 the second store buffer entry drcuit frieves the 

first store buffer entry output to perform a store 
forwarding operation with the first store buffer 

entry output; . 

the store controller being coupled 

30 to the first and second store buffer entry drcuits. 

12. The store functional unit of claim 11 further
comprising:

a third store buffer entry circuit, the third
store buffer entry circuit being coupled to the
second store buffer entry circuit and providing a third
store buffer entry output to the second store buffer
entry, the first store buffer entry circuit being
coupled to the third store buffer entry circuit and
providing a first store buffer entry output to the
third store buffer entry circuit, and the second
store buffer entry circuit being coupled to the
third store buffer entry circuit and providing a
second store buffer entry output to the third store
buffer entry circuit;

wherein the store controller is coupled to the third
store buffer entry circuit and controls whether the
third store buffer entry circuit retrieves either the
first or second store buffer entry outputs to perform
a store forwarding operation with the first or
second store buffer entry outputs.
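
The cross-coupling of store buffer entries in claims 11 and 12 can be illustrated with a short software model. It is a minimal sketch, assuming each entry exposes its contents as an output and that the store controller picks which other entry's output a given entry retrieves. The class names, fields, and the choice to copy the retrieved output wholesale are hypothetical; the claims do not specify what the forwarding operation does with the retrieved output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoreBufferEntry:
    """One store buffer entry circuit (fields are illustrative assumptions)."""
    valid: bool = False
    address: int = 0
    data: int = 0

    def output(self) -> Tuple[bool, int, int]:
        # The entry's output, as seen by the entries it is coupled to.
        return (self.valid, self.address, self.data)


class StoreController:
    """Hypothetical controller coupled to every store buffer entry."""

    def __init__(self, num_entries: int = 3) -> None:
        self.entries: List[StoreBufferEntry] = [
            StoreBufferEntry() for _ in range(num_entries)
        ]

    def hold(self, index: int, address: int, data: int) -> None:
        # Place a store operation in the given entry.
        self.entries[index] = StoreBufferEntry(True, address, data)

    def forward(self, dst: int, src: int) -> None:
        # Controller decision: entry `dst` retrieves the output of entry `src`.
        if dst == src:
            raise ValueError("an entry cannot forward to itself")
        valid, address, data = self.entries[src].output()
        self.entries[dst] = StoreBufferEntry(valid, address, data)
```

With three entries, `forward(2, 0)` and `forward(2, 1)` correspond to the third entry retrieving either the first or the second store buffer entry output.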

13. The store functional unit of claim 12 further
comprising:

a fourth store buffer entry circuit, the
fourth store buffer entry circuit being coupled to
the third store buffer entry circuit and providing a
[...] being coupled to the fourth store buffer entry
[...] store buffer entry circuit, and the third store
buffer [...]

and wherein

[...] the first or second store buffer entry outputs to
perform a store forwarding operation with the
first or second store buffer entry outputs.

14. The store functional unit of claim 11 wherein each
[...] store buffer entry [...]

buffer register circuit for holding [...]

a store buffer entry address register for [...]

a store buffer [...] portion for holding a
store buffer tag entry of the store buffer entry [...]

16. The store functional unit of claim 1[...] [...]

a data byte multiplexer circuit for receiving
[...] entry under control of the store controller;

an address byte multiplexer circuit for
receiving a plurality of address signals and [...]
store controller; and

a [...] circuit for receiving a plurality
of tag signals and providing [...]
the plurality of tag signals as the store buffer [...]
entry under control of the store controller.

17. A load/store functional unit of a microprocessor
[...] a cache in parallel comprising:

[...] first and second [...]

the control circuit being coupled to [...]

18. An apparatus for processing information, [...]

[...] a load/store functional unit for per[...]

the first and second reservation station entries
being coupled [...]

[...] temporarily holding store [...]
circuit including first and second store buffer
entries for temporarily holding store operations, at
least one of the store buffer entries being coupled
to at least one of the reservation station entries;

[...] a control circuit for controlling
the reservation station entries and the store
buffer entries, the control circuit being coupled to
the reservation station circuit, to the store buffer
circuit and to the data cache.
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